1 Introduction



Poisson point processes are a cornerstone of at least two fundamental contributions of Professor
Kiyoshi Itô to Probability Theory, namely the Lévy-Itô decomposition of Lévy processes
(Chapter 1 in [8]) and Itô's excursion theory [9]. The law of rare events, which stresses that
Poisson variables arise as limiting distributions for the number of successes in a large number
of independent trials where each trial has the same small probability of success, explains their
prominent role amongst stochastic processes. For instance, Lévy processes can be viewed as
weak limits of rescaled random walks and their jumps correspond to rare large steps of the
latter. Informally, the law of rare events thus suggests the Poissonian structure of the jumps of
Lévy processes, and this is indeed the core of the Lévy-Itô decomposition. A related but more
delicate heuristic also applies to Itô's description of the excursions of Markov processes, as in
discrete time, the succession of the excursions of a Markov chain away from a recurrent point
forms an i.i.d. sequence of paths.

In this paper, we shall point out the relevance of this paradigm to study a question motivated
by genetics. Recall that Bienaymé-Galton-Watson branching processes [1], [7], [10] model
a population in which at every generation each individual begets according to a fixed offspring
distribution and independently of the other individuals, and then dies. Imagine that neutral
mutations may happen, so that a child can be either a clone of its parent or a mutant, and the
reproduction laws of clones and of mutants are identical. We shall further suppose that each
time a mutation occurs, it produces a mutant with a genetic type (allele) which has never been
observed before; this setting has been referred to as the infinite alleles model by Kimura and
Crow.

The allelic partition consists in decomposing the entire population into sub-families of
individuals carrying the same allele. One important issue in the study of random population
models with mutations (cf. the celebrated sampling formula of Ewens [5] for the Wright-Fisher
model) concerns statistics of this allelic partition: what is the probability of observing allelic
clusters of certain sizes, how to describe the random genealogical structure connecting these
clusters to each other, and so on. Our main concern here will be to investigate asymptotics when
the size of the population is large (typically because the number of ancestors is large) and
mutations are rare. We shall see that, under some mild conditions and for an appropriate regime, a
non-degenerate limit exists and is conveniently described in terms of a certain continuous state
branching process in discrete time [11]. It is well-known that continuous state branching
processes bear close connexions to certain infinitely divisible distributions; in particular we shall
provide a representation of the limiting allelic partitions in terms of Poisson point measures
appearing in the Lévy-Itô decomposition of the jumps of an underlying Lévy process.

Let us give a rough idea of the orders of magnitude of the quantities involved. We shall
consider a fixed reproduction law with unit mean and finite variance, and let the Galton-Watson
process start from n ancestors having all the same genetic type. It is well-known that
if n generations represent one unit of time and if we rescale the population at each generation
by a factor 1/n, then the rescaled Galton-Watson process converges in distribution as n tends
to infinity to a Feller diffusion. We also suppose that neutral mutations affect each child with
probability 1/n. The scaling between population sizes, generations and mutation rates should
not come as a surprise since it is precisely the regime of interest for other standard population
models, such as the Wright-Fisher model and Kingman coalescent [13]. Recall that such a
critical Galton-Watson process becomes extinct after roughly n generations, and that the total
population is of order $n^2$. So there are only a few mutations at each generation and thus about
n different alleles; furthermore the largest allelic sub-families have size of order $n^2$.
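
The orders of magnitude in this paragraph can be recovered by a back-of-the-envelope count. The following is only a heuristic sketch: it treats typical sizes as if they were deterministic, and the symbol $\approx$ is informal shorthand rather than notation from the paper.

```latex
\begin{align*}
\text{total population} &\approx \underbrace{n}_{\text{generations}} \times
   \underbrace{n}_{\text{individuals per generation}} = n^{2},\\
\text{mutations per generation} &\approx n \times \tfrac{1}{n} = 1,\\
\text{number of alleles} &\approx n^{2} \times \tfrac{1}{n} = n .
\end{align*}
```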

Our main result can be described as follows. We use the universal tree $\mathbb{U}$, that is the set
of finite sequences of integers (including the empty sequence that serves as the root of $\mathbb{U}$),
to record the genealogy of alleles, and define the tree of alleles as a random process $\mathcal{A}$ on $\mathbb{U}$,
such that the values at vertices are given by the sizes of the corresponding allelic sub-families,
with the convention that sizes are ranked in the decreasing order among siblings. We consider
a fixed reproduction law which is critical and has finite variance, and for every integer n,
a Galton-Watson process with this reproduction law, started from n ancestors and in which
mutations occur at random with rate 1/n. We write $\mathcal{A}^{(n)}$ for the process on $\mathbb{U}$ that describes
the corresponding tree of alleles. Then as n tends to infinity, the rescaled tree of alleles $n^{-2}\mathcal{A}^{(n)}$
converges in the sense of finite dimensional distributions towards a process $\mathcal{Z}$ on $\mathbb{U}$ with values
in $(0, \infty)$. The latter describes the genealogy of a continuous state branching process in discrete
time with an inverse Gaussian reproduction law. We stress that its law only depends on the
variance of the offspring distribution of the Galton-Watson process, and hence may be viewed
as a universal tree of alleles.

The plan of this paper is as follows. In Section 2, we first present the general setting, stressing
the role of the general branching property for the study of Galton-Watson processes with neutral
mutations. Then we compute explicitly reproduction laws related to allelic sub-families and
point at a connexion with certain downward-skip-free random walks. Such questions have been
addressed from a different point of view in [2], to which the present work can be viewed as a
complement and a sequel. Section 3 provides some background on continuous state branching
processes and convergence of rescaled Galton-Watson processes. The main asymptotic results,
namely Proposition 2 and Theorem 1, are stated and then proved in Section 4.

2 Galton-Watson processes with neutral mutations
2.1 Basic definitions and branching properties 

In a Galton-Watson process with neutral mutations, every individual reproduces according to
the same distribution and independently of the other individuals, no matter whether it is a
mutant or a clone. Of course, a clone child of a mutant bears the same allele as its parent.
Recall also that we are working in the infinite alleles setting, i.e. the same genetic type cannot
be recovered from a cycle of mutations. Our basic data are hence provided by a pair of
non-negative integer-valued random variables

\[ \xi = (\xi^{(c)}, \xi^{(m)}) \]

which describes the number of clone-children and the number of mutant-children of a typical
individual. In this paper, we shall mainly be interested in a special situation which appears
commonly as a model in population genetics, namely where mutations affect each child according
to a fixed probability and independently of the other children (in other words, the
conditional distribution of $\xi^{(m)}$ given $\xi^{(c)} + \xi^{(m)} = \ell$ is binomial with parameter $(\ell, p)$). However,
the first steps of the analysis can be carried out without difficulty in the general framework.
We assume throughout this work that

\[ \mathbb{E}(\xi^{(c)}) \le 1, \]

i.e. the process of clones is critical or sub-critical¹; and we further implicitly exclude the
degenerate cases when $\xi^{(c)} \equiv 0$ or $\xi^{(m)} \equiv 0$. For every integer $a \ge 1$, we denote by $\mathbb{P}_a$ the law
of a Galton-Watson process with neutral mutations, started from a ancestors having the same
genetic type and with reproduction law given by that of $\xi = (\xi^{(c)}, \xi^{(m)})$.

The basic branching property states that for every fixed generation, conditionally on the
number of individuals at that generation, the descents of those individuals are given by
independent copies of the initial process, independently of the preceding generations. It is natural
to expect that this branching property should hold more generally for certain stopping rules,
and the fact that this is indeed the case will play an important role in our analysis. For the sake of
simplicity, we shall now present such an extension in a rather informal way, referring to Chauvin
[3] for technical details.

¹ Note that this is weaker than assuming that $\mathbb{E}(\xi^{(c)} + \xi^{(m)}) \le 1$, which was required in [2].

The genealogy of each ancestor is conveniently described by a planar rooted tree, with edges
connecting parents to children. More precisely, this requires an additional ordering of the
children of each individual, and in this direction we may decide to rank siblings uniformly at
random. A line is defined as a family of edges such that every branch from the root (i.e. the
ancestor) contains at most one edge in that family. For instance, the edges between parents
at generation $k \in \mathbb{Z}_+$ and children at generation $k + 1$ form a line. A stopping line should be
thought of as a random line such that for every edge in the tree, the event that this edge is part
of the line only depends on the marks found on the path from the root to that edge. Recall that
every edge of the tree corresponds to a pair of individuals (parent, child), and denote by $C_r$ the
subset of children in the family of edges of some stopping line r. By removing the edges of r, we
disconnect the genealogical tree into sub-trees whose roots are formed on the one hand by the
ancestor, and on the other hand by the individuals in $C_r$. The general branching property then
states that conditionally on $C_r$, the sub-trees rooted at the individuals in $C_r$ are independent
copies of the initial genealogical tree, and also independent of the initial tree pruned along r.

We now take into account mutations by assigning marks to the edges between parents and
their mutant children. Since we are interested in the genealogy of alleles (or equivalently, of
mutants), it is convenient to say that an individual has the k-th type if its genotype has been
affected by k mutations, that is if its ancestral line comprises exactly k marks. Plainly, the
family $r(k)$ of edges connecting a parent of the (k − 1)-th type to a mutant child is a stopping
line, and the set $C_{r(k)}$ coincides with that of the mutants of the k-th type. We denote by $T_k$
the total population of individuals of the k-th type and by $M_k$ the total number of mutants of
the k-th type, agreeing that mutants of the 0-th type are the ancestors (so $M_0 = a$, $\mathbb{P}_a$-a.s.). The
general branching property should make the following statement obvious; we refer the reader
to e.g. Chapter Ten in Taïb [15] for a rigorous argument.

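Lemma 1 Under $\mathbb{P}_a$,

\[ (M_k, k \in \mathbb{Z}_+) \]

is a standard Galton-Watson process with reproduction law $\mathbb{P}_1(M_1 \in \cdot)$. More generally,

\[ ((T_k, M_{k+1}), k \in \mathbb{Z}_+) \]

is a Markov chain with transition probabilities

\[ \mathbb{P}_a(T_k = n', M_{k+1} = m' \mid T_{k-1} = n, M_k = m) = \mathbb{P}_m(T_0 = n', M_1 = m') . \]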

Remark. We stress the fact that the chain $(T_k, k \in \mathbb{Z}_+)$ of the sizes of sub-populations
with given types is not Markov; nonetheless it can be viewed as a hidden Markov chain. In
this direction, we also point out that $((T_k, M_k), k \in \mathbb{Z}_+)$ is Markovian, since the transition
probabilities of the chain $((T_k, M_{k+1}), k \in \mathbb{Z}_+)$ only depend on the second coordinate. Indeed,
by a straightforward application of the general branching property, one gets (assuming implicitly
that the events on which we condition have positive probability)
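
One way to write out such an identity is sketched below. It is derived here from Lemma 1 and the general branching property, so the labelling of indices and the convention that $T_0 = 0$ under $\mathbb{P}_0$ are assumptions of this sketch rather than statements taken from the source.

```latex
\[
\mathbb{P}_a\bigl(T_{k+1}=n',\, M_{k+1}=m' \,\big|\, T_k=n,\, M_k=m\bigr)
 \;=\; \mathbb{P}_m\bigl(M_1=m' \,\big|\, T_0=n\bigr)\;\mathbb{P}_{m'}\bigl(T_0=n'\bigr).
\]
```

The first factor is the law of the number of mutants begotten by a clone population of total size n issued from m founders; the second is the law of the total clone population issued from the m' new mutants.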




Next, observe that in an infinite alleles model, the genealogy of individuals naturally induces
a genealogy for the alleles in that population. Indeed, we may identify alleles and mutants,
which enables us to use the set of new mutants plus a root corresponding to the ancestors of
the population (recall that we assume that all the ancestors have the same genetic type) as the
set of vertices. We draw an edge between the root and mutants of the 1st type, and for every
$k \ge 1$ we also draw an edge between a mutant of the k-th type and a mutant of the (k + 1)-th type
if and only if the path connecting these individuals in the genealogical tree does not contain
other mutants. Hence the set of alleles has a natural structure of rooted tree. Note that for
$k \ge 1$, $M_k$ corresponds to the number of vertices at the k-th level in the tree of alleles.

Our main goal in this paper is to establish asymptotic features on the genealogy of allelic
sub-families, and in this direction, it will be convenient to view the latter as random processes
indexed by the universal tree. More precisely, introduce the set of finite sequences of positive
integers

\[ \mathbb{U} := \bigcup_{k \in \mathbb{Z}_+} \mathbb{N}^k , \]

where $\mathbb{N} = \{1, 2, \ldots\}$ and $\mathbb{N}^0 = \{\emptyset\}$. Let us briefly recall some standard notation in this setting:
if $u = (u_1, \ldots, u_k)$ is a vertex at level $k \ge 0$ in $\mathbb{U}$, then the children of u are $uj := (u_1, \ldots, u_k, j)$
for $j \in \mathbb{N}$. We also denote by $|u|$ the level of the vertex u, with the convention that the root
has level 0, i.e. $|\emptyset| = 0$. We now take advantage of the natural tree structure of $\mathbb{U}$ to record
the genealogy of allelic sub-families together with their sizes.

Given a Galton-Watson process with neutral mutations, we construct recursively a process
$\mathcal{A} = (\mathcal{A}_u : u \in \mathbb{U})$ as follows. First, $\mathcal{A}_\emptyset = T_0$ is the size of the sub-population without mutation.
Next, recall that $M_1$ denotes the number of mutants of the first type. We enumerate the $M_1$
allelic sub-populations of the first type in the decreasing order of their sizes, with the convention
that in the case of ties, sub-populations of the same size are ranked uniformly at random. We
denote by $\mathcal{A}_j$ the size of the j-th allelic sub-population of the first type, agreeing that $\mathcal{A}_j = 0$
if $j > M_1$. We then complete the construction at all levels by iteration in an obvious way.
Specifically, if $\mathcal{A}_u = 0$ for some $u \in \mathbb{U}$, then $\mathcal{A}_{uj} = 0$ for all $j \in \mathbb{N}$. Otherwise, we enumerate
in the decreasing order of their sizes the allelic sub-populations of type $|u| + 1$ which descend
from the allelic sub-family indexed by the vertex u, and then $\mathcal{A}_{uj}$ is the size of the j-th such
sub-family (as before, in the case of ties, sub-families are ordered uniformly at random, and empty
sub-families have size 0). See Figure 1 for an example. We call the process $\mathcal{A} = (\mathcal{A}_u : u \in \mathbb{U})$
the tree of alleles.
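
The recursive construction above can be illustrated by a small simulation. The following Python sketch is only an illustration under stated assumptions: it takes the binomial mutation mechanism (each child mutates with probability `p`), it relies on the branching property to grow each allelic sub-family independently from its founding mutant, and the names `clone_family`, `tree_of_alleles` and `offspring` are hypothetical helpers, not notation from the paper. Vertices of the universal tree are encoded as tuples of positive integers, and only vertices with positive size are stored.

```python
import random


def clone_family(founders, offspring, p, rng):
    """Explore the clones descending from `founders` individuals of one allele.

    Returns (size, mutants): the number of individuals carrying the allele
    (founders included) and the number of mutant children they beget.
    """
    size, mutants, alive = 0, 0, founders
    while alive > 0:
        size += alive
        next_alive = 0
        for _ in range(alive):
            for _ in range(offspring(rng)):
                if rng.random() < p:
                    mutants += 1      # a new allele appears
                else:
                    next_alive += 1   # a clone of its parent
        alive = next_alive
    return size, mutants


def tree_of_alleles(a, offspring, p, rng):
    """Return a dict mapping vertices u (tuples) to the sizes A_u > 0."""
    size, mutants = clone_family(a, offspring, p, rng)
    tree = {(): size}
    stack = [((), mutants)]
    while stack:
        u, degree = stack.pop()
        # Each mutant child founds its own allelic sub-family.
        children = [clone_family(1, offspring, p, rng) for _ in range(degree)]
        rng.shuffle(children)                          # uniform tie-breaking
        children.sort(key=lambda c: c[0], reverse=True)  # stable sort keeps it
        for j, (s, m) in enumerate(children, start=1):
            tree[u + (j,)] = s
            stack.append((u + (j,), m))
    return tree
```

For instance, `tree_of_alleles(10, lambda r: r.choice((0, 0, 1, 2)), 0.1, random.Random(0))` returns one sample of the tree for a sub-critical offspring law; a critical law may produce very large populations, so some care is needed with run time.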

² For the sake of clarity we shall keep the name generation for the distance to the root of individuals in the
genealogical tree, and use the name level when dealing with the structure of alleles.

Figure 1: Genealogical tree with mutations (left) and tree of alleles (right). The different symbols
at the vertices represent the different alleles. The labels on the tree of alleles are the sizes
of the corresponding allelic sub-families; sub-families with zero size (i.e. which are empty) are
omitted.

It is important to observe that the transition probabilities of the chain $((T_k, M_{k+1}), k \in \mathbb{Z}_+)$
in Lemma 1 depend only on the second coordinate, and that the latter alone is a Galton-Watson
process. This suggests that the tree of alleles should enjoy some kind of branching property.
In order to give a formal statement, it is convenient to define first the (outer) degree of the tree
of alleles $\mathcal{A}$ at some vertex $u \in \mathbb{U}$ as

\[ d_u := \max\{j \ge 1 : \mathcal{A}_{uj} > 0\} , \]

where we agree that $\max \emptyset = 0$. In words, $d_u$ is the number of allelic sub-populations of type
$|u| + 1$ which descend from the allelic sub-family indexed by the vertex u; in particular $d_\emptyset = M_1$.
We shall also need the following notation. Let $\gamma$ be a random pair of non-negative integers, $d \ge 1$ an integer,
and $\gamma^{(d)} = (\gamma_1, \ldots, \gamma_d)$ where the $\gamma_i$ are independent copies of $\gamma$. We then denote by $\gamma^{(d\downarrow)}$ the
rearrangement of $\gamma^{(d)}$ in the decreasing order of the first coordinate, with the convention that in
the case of ties, the variables $\gamma_i$ with the same first coordinate are ranked uniformly at random.

The characterization of the probabilistic structure of the tree of alleles that we are now ready
to present stems again easily from the general branching property by iteration.

Lemma 2 For all integers $a \ge 1$ and $k \ge 0$, the tree of alleles fulfills the following properties
under $\mathbb{P}_a$ conditionally on $((\mathcal{A}_v, d_v) : |v| \le k)$:

(i) the families of variables

\[ ((\mathcal{A}_{uj}, d_{uj}) : 1 \le j \le d_u) , \quad u \text{ vertex at level } k \text{ such that } \mathcal{A}_u > 0 , \]

are independent,

(ii) for each vertex u at level k with $\mathcal{A}_u > 0$, the $d_u$-tuple $((\mathcal{A}_{uj}, d_{uj}) : 1 \le j \le d_u)$ is distributed
as $(T_0, M_1)^{(d_u\downarrow)}$ under $\mathbb{P}_1$.

Of course Lemma 2 is much more informative than the sole Markovian description of the
chain $((T_k, M_{k+1}), k \in \mathbb{Z}_+)$ in Lemma 1, as it retains the information about the genealogy of the
allelic sub-families and not merely the sizes of populations of a given type. In this direction,
observe that

\[ T_k = \sum_{|u| = k} \mathcal{A}_u \quad \text{and} \quad M_{k+1} = \sum_{|u| = k} d_u . \]

2.2 Calculation of reproduction laws 

We shall now determine the transition probabilities that appear in Lemma 1. Essentially, this
has been achieved recently in [2] using an approach that largely relies on Harris' connexion
between downward-skip-free random walks and standard Galton-Watson processes, extended
to encompass the situation where neutral mutations occur. Here, we shall use a different route,
developing calculations that involve generating functions in the case when mutants are supposed
to be sterile.

We denote the law of $\xi = (\xi^{(c)}, \xi^{(m)})$ by $\pi = (\pi_{k,\ell} : k, \ell \in \mathbb{Z}_+)$, that is

\[ \pi_{k,\ell} = \mathbb{P}(\xi^{(c)} = k, \xi^{(m)} = \ell) . \]

We also introduce the generating function

\[ g(x, y) := \sum_{k,\ell=0}^{\infty} x^k y^\ell \pi_{k,\ell} = \mathbb{E}\bigl(x^{\xi^{(c)}} y^{\xi^{(m)}}\bigr) , \qquad x, y \in [0, 1] . \]
k,e=o 

As we are interested in the joint distribution of the total number of individuals of the 0-th type
and the number of mutants of the 1-st type, we may imagine a two-type branching process such
that clones reproduce independently of each other according to the same distribution $\pi$, while
mutants are sterile, i.e. have no progeny a.s. We write $\varphi$ for the generating function of the
total population of 0-th type and the number of mutants when there is a single ancestor, i.e.

\[ \varphi(x, y) := \mathbb{E}_1\bigl(x^{T_0} y^{M_1}\bigr) , \qquad x, y \in [0, 1] , \]

so that by the branching property, the generating function of $(T_0, M_1)$ under $\mathbb{P}_a$ is $\varphi^a$. The
following result is a slight extension of Theorem 1(ii) of [2] (recall that here we only assume
that $\mathbb{E}(\xi^{(c)}) \le 1$ and have an arbitrary number of ancestors, while in [2] we worked with a
single ancestor and assumed that $\mathbb{E}(\xi^{(c)} + \xi^{(m)}) \le 1$). It can be viewed as a generalization
of the well-known Dwass formula [1] for the distribution of the total population in standard
Galton-Watson processes.






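Proposition 1 (i) The generating function $\varphi$ is determined by the equation

\[ \varphi(x, y) = x\, g(\varphi(x, y), y) , \qquad x, y \in [0, 1] . \]

(ii) The distribution of $(T_0, M_1)$ is given by

\[ \mathbb{P}_a(T_0 = n, M_1 = \ell) = \frac{a}{n}\, \pi^{*n}_{n-a,\ell} , \qquad n \ge a \ge 1 \text{ and } \ell \ge 0 , \]

where $\pi^{*n}$ denotes the n-th convolution power of $\pi$ (i.e. $\pi^{*n}$ is the distribution of the sum of n
i.i.d. copies of $\xi$).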
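
To make the formula in (ii) concrete, here is a minimal numerical sketch in Python. It assumes the binomial mutation mechanism with a finitely supported offspring law, and the helper names (`binomial_mutation_law`, `convolve`, `law_T0_M1`) as well as the truncation level `nmax` are illustrative choices, not part of the paper. The code evaluates $(a/n)\,\pi^{*n}_{n-a,\ell}$ by repeated convolution.

```python
from math import comb


def binomial_mutation_law(offspring, p):
    """Joint law pi[(k, l)] of (clone children, mutant children).

    `offspring` maps a number of children to its probability; each child is
    a mutant with probability p, independently of the others.
    """
    pi = {}
    for i, w in offspring.items():
        for l in range(i + 1):
            k = i - l
            pi[(k, l)] = pi.get((k, l), 0.0) + w * comb(i, l) * p**l * (1 - p)**k
    return pi


def convolve(mu, nu, kmax):
    """Convolution of two measures on pairs, keeping first coordinates <= kmax."""
    out = {}
    for (k1, l1), w1 in mu.items():
        for (k2, l2), w2 in nu.items():
            k = k1 + k2
            if k <= kmax:
                out[(k, l1 + l2)] = out.get((k, l1 + l2), 0.0) + w1 * w2
    return out


def law_T0_M1(pi, a, nmax):
    """Return {(n, l): (a/n) * pi^{*n}(n - a, l)} for a <= n <= nmax."""
    law = {}
    power = {(0, 0): 1.0}  # pi^{*0} is the Dirac mass at (0, 0)
    for n in range(1, nmax + 1):
        power = convolve(power, pi, nmax)  # now power = pi^{*n}, truncated
        if n >= a:
            for (k, l), w in power.items():
                if k == n - a:
                    law[(n, l)] = a / n * w
    return law
```

As a sanity check, with a single ancestor the value at $(n, \ell) = (1, 0)$ reduces to $\pi_{0,0}$, the probability that the ancestor has no child at all, and in a sub-critical example the returned masses should sum to nearly 1 once `nmax` is moderately large.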

Proof: (i) A standard application of the branching property at the first generation gives

\[ \varphi(x, y) = x \sum_{i,j=0}^{\infty} \mathbb{P}(\xi^{(c)} = i, \xi^{(m)} = j)\, \mathbb{E}_i\bigl(x^{T_0} y^{M_1}\bigr)\, y^j = x\, g(\varphi(x, y), y) . \]

This invites us to consider the equation in the variable $z \in [0, 1]$

\[ \frac{g(z, y)}{z} = \frac{1}{x} , \tag{1} \]

where $x, y \in (0, 1]$ are fixed. Our assumptions $\mathbb{E}(\xi^{(c)}) \le 1$ and $\xi^{(c)} \not\equiv 1$ imply that $g(0, y) > 0$,
and hence $\lim_{z \to 0+} g(z, y)/z = \infty$. On the other hand, the derivative of $z \to z^{-1} g(z, y)$ is
$z \to z^{-2}(z\, \partial_z g(z, y) - g(z, y))$, and this derivative is strictly negative when $z > 0$ is sufficiently
small. This ensures that for each fixed $y \in [0, 1]$ and $x > 0$ small enough, the equation (1) has
a unique solution $z = \varphi(x, y)$, and this suffices to determine the law of $(T_0, M_1)$.

(ii) We shall now derive explicitly the law of $(T_0, M_1)$ under $\mathbb{P}_a$ from its generating function
$\varphi^a$ using the classical Lagrange inversion formula. For each fixed $y \in [0, 1]$, the function
$x \to g(x, y)$ is analytic with $g(0, y) \neq 0$. More precisely, we have

\[ g(x, y) = \sum_{k=0}^{\infty} \alpha_k(y)\, x^k \quad \text{with} \quad \alpha_k(y) := \sum_{\ell=0}^{\infty} y^\ell \pi_{k,\ell} . \]

According to the Lagrange inversion formula (see for instance Section 5.1 in [IS]), the a-th power
of the solution to the equation (1) with $y \in [0, 1]$ fixed and $x > 0$ sufficiently small can be
expressed in the form

\[ \varphi^a(x, y) = \sum_{n=1}^{\infty} \frac{a}{n}\, x^n \alpha^{*n}_{n-a} , \]

where $\alpha^{*n}$ stands for the n-th convolution power of the finite measure $\alpha = (\alpha_k(y) : k \in \mathbb{Z}_+)$.
Observe that the generating function of $\alpha$ is $x \to g(x, y)$, so that of $\alpha^{*n}$ is

\[ x \to g(x, y)^n = \sum_{k=0}^{\infty} x^k \Bigl( \sum_{\ell=0}^{\infty} y^\ell \pi^{*n}_{k,\ell} \Bigr) . \]

Hence we have

\[ \alpha^{*n}_{n-a} = \sum_{\ell=0}^{\infty} y^\ell \pi^{*n}_{n-a,\ell} , \]

and we conclude that

\[ \varphi^a(x, y) = \sum_{n=1}^{\infty} \sum_{\ell=0}^{\infty} x^n y^\ell\, \frac{a}{n}\, \pi^{*n}_{n-a,\ell} , \]

which completes the proof of (ii). □

As generating functions easily yield moments of variables, one immediately deduces from
Proposition 1 simple criteria to decide whether the process of the numbers of mutants
$M = (M_k, k \in \mathbb{Z}_+)$ is critical, sub-critical, or super-critical, or has a finite second moment.

Corollary 1 (i) Suppose that the mean number of clone children is sub-critical, i.e. $\mathbb{E}(\xi^{(c)}) < 1$.
Then

\[ \mathbb{E}_1(M_1) = \frac{\mathbb{E}(\xi^{(m)})}{1 - \mathbb{E}(\xi^{(c)})} , \]

and in particular

\[ \mathbb{E}_1(M_1) \lesseqgtr 1 \iff \mathbb{E}(\xi^{(c)} + \xi^{(m)}) \lesseqgtr 1 . \]

Further

\[ \mathbb{E}_1(M_1^2) < \infty \iff \mathbb{E}\bigl((\xi^{(c)} + \xi^{(m)})^2\bigr) < \infty . \]

(ii) If $\mathbb{E}(\xi^{(c)}) = 1$, then $\mathbb{E}_1(M_1) = \infty$.

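Proof: Recall that the first moment of an integer-valued variable is given by the left-derivative
at 1 of its generating function. We get from Proposition 1(i)

\[ \frac{\partial \varphi}{\partial y}(x, y) = x\, \frac{\partial g}{\partial y}(\varphi(x, y), y) + x\, \frac{\partial g}{\partial x}(\varphi(x, y), y)\, \frac{\partial \varphi}{\partial y}(x, y) . \]

Since $\varphi(1, 1) = 1$, this identity forces

\[ \frac{\partial \varphi}{\partial y}(1, 1) = \mathbb{E}_1(M_1) = \infty \]

when

\[ \frac{\partial g}{\partial x}(1, 1) = \mathbb{E}(\xi^{(c)}) = 1 \]

(recall that $\mathbb{E}(\xi^{(m)}) > 0$ by assumption), whereas it entails

\[ \mathbb{E}_1(M_1) = \frac{\mathbb{E}(\xi^{(m)})}{1 - \mathbb{E}(\xi^{(c)})} \]

whenever $\mathbb{E}(\xi^{(c)}) < 1$.

Observe further that the process of the number of clone children is a branching process with
offspring distribution given by the law of $\xi^{(c)}$. In particular, in the sub-critical case $\mathbb{E}(\xi^{(c)}) < 1$,
the total population of clones has a finite expectation given by $\mathbb{E}_a(T_0) = a/(1 - \mathbb{E}(\xi^{(c)}))$. The
first equivalence in (i) follows readily. Similar calculations involving the second derivative of
generating functions yield the second equivalence in (i). □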
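
As a worked illustration of the first-moment formula, consider the binomial mutation mechanism described in Section 2.1. The symbol $m_+$ for the mean total number of children is introduced here only for this example and is not notation from the paper.

```latex
\begin{align*}
m_+ &:= \mathbb{E}\bigl(\xi^{(c)} + \xi^{(m)}\bigr), \qquad
\mathbb{E}(\xi^{(m)}) = p\, m_+, \qquad
\mathbb{E}(\xi^{(c)}) = (1-p)\, m_+ ,\\[2pt]
\mathbb{E}_1(M_1) &= \frac{p\, m_+}{1 - (1-p)\, m_+}
  \qquad \text{provided } (1-p)\, m_+ < 1 ,\\[2pt]
m_+ = 1 \ \text{and}\ p > 0 \quad &\Longrightarrow \quad \mathbb{E}_1(M_1) = \frac{p}{1-(1-p)} = 1 .
\end{align*}
```

So a critical total reproduction law with a positive mutation probability yields a critical process of mutants, in agreement with the first equivalence in (i).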
2.3 Construction from a random walk

The starting point of this section is the observation that the transition probabilities of the
Markov chain $((T_k, M_{k+1}) : k \in \mathbb{Z}_+)$ have a simple interpretation in terms of random walks. In
this direction, let us first introduce some notation. We consider a sequence
$\xi = (\xi_n = (\xi^{(c)}_n, \xi^{(m)}_n) : n \in \mathbb{N})$ of i.i.d. variables with law $\pi$, and then the random walk
started from $a \ge 1$ and with steps $\xi^{(c)} - 1$,

\[ S^{(c)}_n := a + \xi^{(c)}_1 + \cdots + \xi^{(c)}_n - n , \qquad n \in \mathbb{Z}_+ . \]

It is convenient to use the (slightly abusive) notation $\mathbb{P}_a$ for the law of $(S^{(c)}_n : n \in \mathbb{Z}_+)$. We also
define the first hitting times

\[ \varsigma(j) := \inf\{n \in \mathbb{Z}_+ : S^{(c)}_n = -j\} , \qquad j \in \mathbb{Z}_+ , \]

and

\[ \Sigma(j) := \sum_{i=1}^{\varsigma(j)} \xi^{(m)}_i . \]

We stress that our basic assumption $\mathbb{E}(\xi^{(c)}) \le 1$ ensures that the random walk $S^{(c)}$ does not
drift to $+\infty$, and hence the passage times $\varsigma(j)$ are finite a.s. The first identity in the next lemma can
be viewed as a two-dimensional extension of the well-known result of Otter and Dwass (see e.g.
Section 6.2 in [13]) which relates the distribution of the total population in a Galton-Watson
process to that of the first hitting time of 0 by a random walk.

Lemma 3 The pairs of random variables

\[ (\varsigma(0), \Sigma(0)) \quad \text{and} \quad (T_0, M_1) \]

have the same distribution under $\mathbb{P}_a$. Further, the shifted sequence $(\xi_{\varsigma(0)+j} : j \in \mathbb{N})$ consists of
i.i.d. variables with law $\pi$ and is independent of $(\varsigma(0), \Sigma(0))$.

Proof: Introduce for a = 1 the generating function

\[ \tilde\varphi(x, y) := \mathbb{E}_1\bigl(x^{\varsigma(0)} y^{\Sigma(0)}\bigr) , \qquad x, y \in [0, 1] . \]

Because $(S^{(c)}_n : n \in \mathbb{Z}_+)$ is a downwards skip free random walk, an application of the strong
Markov property at its first downward passage times shows readily that for an arbitrary integer
$a \ge 1$

\[ \mathbb{E}_a\bigl(x^{\varsigma(0)} y^{\Sigma(0)}\bigr) = \tilde\varphi(x, y)^a , \qquad x, y \in [0, 1] . \]

Now we return to the case a = 1; by conditioning on the first step of the random walk, we
get the obvious identity

\[ \tilde\varphi(x, y) = \mathbb{E}_1\bigl(x^{\varsigma(0)} y^{\Sigma(0)}\bigr) = x \sum_{k,\ell=0}^{\infty} \tilde\varphi(x, y)^k\, y^\ell\, \pi_{k,\ell} = x\, g(\tilde\varphi(x, y), y) , \]

where g denotes the generating function of $\xi = (\xi^{(c)}, \xi^{(m)})$. Thus $\tilde\varphi$ solves the equation of
Proposition 1(i), which establishes our first claim. As the hitting time $\varsigma(0)$ is a stopping time,
an application of the strong Markov property then yields the second assertion. □

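Next, set $\tilde T_0 := \varsigma(0)$, $\tilde M_1 := \Sigma(0)$ and define for every $k \in \mathbb{N}$ by an implicit recurrence

\[ \tilde T_0 + \cdots + \tilde T_k = \varsigma(\tilde M_1 + \cdots + \tilde M_k) \]

and

\[ \tilde M_1 + \cdots + \tilde M_{k+1} = \sum_{i=1}^{\tilde T_0 + \cdots + \tilde T_k} \xi^{(m)}_i = \Sigma(\tilde M_1 + \cdots + \tilde M_k) . \]

Figure 2 below depicts these quantities.

Figure 2: The graph of the random walk $S^{(c)}$; the * represent the non-zero values of the
variables $\xi^{(m)}$. Here $\tilde M_1 = 2$, $\tilde M_2 = 2$, $\tilde M_3 = 1$ and $\tilde M_4 = 0$.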

Corollary 2 For every $a \ge 1$, the chains $((\tilde T_k, \tilde M_{k+1}) : k \in \mathbb{Z}_+)$ and $((T_k, M_{k+1}) : k \in \mathbb{Z}_+)$
have the same distribution under $\mathbb{P}_a$.

Proof: It is immediately checked by induction that each $\tau_k := \tilde T_0 + \cdots + \tilde T_k$ is a stopping time in
the natural filtration $(\mathcal{F}(n))_{n \in \mathbb{N}}$ generated by the i.i.d. sequence $(\xi_n : n \in \mathbb{N})$, and that $\tilde M_{k+1}$ is
$\mathcal{F}(\tau_k)$-measurable. By an application of the strong Markov property, we get that $((\tilde T_k, \tilde M_{k+1}) :
k \in \mathbb{Z}_+)$ is a homogeneous Markov chain. More precisely, the conditional distribution of
$(\tilde T_k, \tilde M_{k+1})$ given $\tilde T_{k-1} = t$ and $\tilde M_k = m$ is that of $(\varsigma(0), \Sigma(0))$ under $\mathbb{P}_m$. Combining these
observations with Lemmas 1 and 3 completes the proof. □
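
The random-walk construction of this section can be sketched in a few lines of Python. This is a minimal illustration, assuming the steps are supplied as an explicit finite sequence of pairs (clone children, mutant children); the function names `first_passage` and `allele_chain` are hypothetical and chosen only for this example.

```python
def first_passage(a, steps):
    """First hitting time of 0 by the walk started at a with steps xi_c - 1.

    `steps` is an iterable of pairs (xi_c, xi_m). Returns the pair
    (hitting time, number of mutants accumulated up to that time).
    """
    s, mutants = a, 0
    for n, (xi_c, xi_m) in enumerate(steps, start=1):
        s += xi_c - 1          # the walk can go down by at most one unit
        mutants += xi_m
        if s == 0:
            return n, mutants
    raise ValueError("the walk did not reach 0 within the supplied steps")


def allele_chain(a, steps):
    """Successive pairs (sub-population size, number of new mutants).

    Each new batch of m mutants restarts the passage problem from level m,
    consuming the steps that follow the previous hitting time.
    """
    chain = []
    remaining = iter(steps)
    m = a
    while m > 0:
        t, m = first_passage(m, remaining)
        chain.append((t, m))
    return chain
```

With the steps `[(2, 0), (0, 1), (1, 1), (0, 0), (0, 0), (0, 1), (0, 0)]` and a single ancestor, the walk first hits 0 after four steps having seen two mutants, then needs two more steps to go down two further levels, and so on until a batch produces no mutant.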
More generally, we can apply Lemma 3 to construct from the i.i.d. variables $\xi_n$ a random
process $\mathcal{A}'$ indexed by $\mathbb{U}$ with the same distribution as the tree of alleles $\mathcal{A}$, by making use of the
characterization of the law of the latter in Lemma 2. To start with, the process $\mathcal{A}'$ fulfills the
following two requirements. First, if $\mathcal{A}'_u = 0$ for some $u \in \mathbb{U}$, then $\mathcal{A}'_{uj} = 0$ for all $j \in \mathbb{N}$.
Second, for every vertex $u \in \mathbb{U}$ such that $\mathcal{A}'_u > 0$, the (outer) degree of $\mathcal{A}'$ at u,

\[ d'_u = \#\{j \in \mathbb{N} : \mathcal{A}'_{uj} > 0\} , \]

is a finite number and $\mathcal{A}'_{uj} > 0$ if and only if $j \le d'_u$. We set $\mathcal{A}'_\emptyset = \varsigma(0)$ and $d'_\emptyset = \Sigma(0)$. Next,
consider the increments

\[ \Delta(j) := \varsigma(j) - \varsigma(j - 1) \quad \text{and} \quad \delta(j) := \Sigma(j) - \Sigma(j - 1) , \qquad j \ge 1 . \]

For vertices at the first level, $((\mathcal{A}'_j, d'_j) : 1 \le j \le d'_\emptyset)$ is given by the rearrangement of the
sequence $((\Delta(j), \delta(j)) : 1 \le j \le d'_\emptyset)$ in the decreasing order of the first coordinate $\Delta(j)$ (with the
usual convention in case of ties). We may then continue with vertices of the next levels by an
iteration which should be obvious (but which would also be quite intricate to state explicitly).
Figure 3 below may help visualizing the construction.




Figure 3: Tree of alleles constructed from the random walk $S^{(c)}$ and the variables $\xi^{(m)}$ of
Figure 2. The labels on the vertices are the lengths of the excursions of $S^{(c)}$ above its current
minimum; they correspond to the sizes of the allelic sub-families (again sub-families with size
0 are omitted).

3 Background on continuous state branching processes

Before describing our main limit results for trees of alleles, we need to develop some basic
material about limits of rescaled Galton-Watson processes. The Lévy-Itô decomposition of
subordinators plays a crucial role for the representation of the genealogical structure of the
continuous state limits.
We start with the classical convergence to Feller diffusions |Sl|TT], i.e. the solutions
$(X(x, t), t \ge 0)$ to stochastic differential equations of the type

\[ X(x, t) = x + \int_0^t \sigma \sqrt{X(x, s)}\, dB_s + b \int_0^t X(x, s)\, ds , \qquad t \ge 0 , \tag{2} \]

where $x > 0$ is the initial value, $b \in \mathbb{R}$ and $\sigma^2 > 0$ are parameters, and $(B_t : t \ge 0)$ denotes a
standard Brownian motion. For every $n \in \mathbb{N}$, consider a Galton-Watson process $(Z^{(n)}_k : k \in \mathbb{Z}_+)$
which starts from $Z^{(n)}_0 = a(n)$ ancestors and has reproduction law $p^{(n)}$, where $p^{(n)}$ is some
probability measure on $\mathbb{Z}_+$ and $a(n)$ a positive integer. Write

\[ m(p^{(n)}) := \sum_{i=0}^{\infty} i\, p^{(n)}_i \quad \text{and} \quad \mathrm{var}(p^{(n)}) := \sum_{i=0}^{\infty} \bigl(i - m(p^{(n)})\bigr)^2 p^{(n)}_i \]

for the first moment and the variance of $p^{(n)}$. In the situation where

\[ a(n) \sim nx , \quad m(p^{(n)}) - 1 \sim b n^{-1} \quad \text{and} \quad \mathrm{var}(p^{(n)}) \sim \sigma^2 \quad \text{as } n \to \infty \tag{3} \]

for some $x \in (0, \infty)$, $b \in \mathbb{R}$ and $\sigma^2 > 0$, it is well known that

\[ \bigl(n^{-1} Z^{(n)}_{[nt]} : t \ge 0\bigr) \Longrightarrow (X(x, t) : t \ge 0) \tag{4} \]

where the notation $\Longrightarrow$ refers to convergence in distribution as $n \to \infty$ and $X(x, t)$ is the Feller
diffusion specified by (2).

We next turn our attention to the simpler situation where one only rescales the number of
individuals and uses the generations as a discrete time parameter. For the sake of clarity, we
shall deal with a framework that is slightly less general than it could be. We denote the tail
distribution of $p^{(n)}$ by $\bar p^{(n)}(y) := p^{(n)}((y, \infty))$ for $y \ge 0$ and now assume that

\[ \lim_{n \to \infty} n^{-1} a(n) = x \quad \text{and} \quad \lim_{n \to \infty} n\, \bar p^{(n)}(ny) = \bar\nu(y) \ \text{ in } L^1_{\mathrm{loc}}([0, \infty), dy) , \tag{5} \]

where $\bar\nu$ is some locally integrable non-increasing function on $[0, \infty)$ with $\bar\nu(\infty) = 0$. We may
thus think of $\bar\nu$ as the tail of a Radon measure $\nu$ on $(0, \infty)$ with $\int (1 \wedge y)\, \nu(dy) < \infty$; $\nu$ will
often be referred to as a Lévy measure. Our assumptions ensure that

\[ n^{-1} Z^{(n)}_1 \Longrightarrow Z_1 , \tag{6} \]

where $Z_1$ is a random variable with values in $[0, \infty)$ which is infinitely divisible. Indeed, we
have for any $q \ge 0$

\[ \mathbb{E}\bigl(\exp(-q n^{-1} Z^{(n)}_1)\bigr)
 = \Bigl( \int_{[0,\infty)} e^{-qy/n}\, p^{(n)}(dy) \Bigr)^{a(n)}
 = \Bigl( 1 - \frac{q}{n} \int_0^\infty e^{-qy/n}\, \bar p^{(n)}(y)\, dy \Bigr)^{a(n)}
 = \Bigl( 1 - q \int_0^\infty e^{-qy}\, \bar p^{(n)}(ny)\, dy \Bigr)^{a(n)} , \]

and (5) ensures that the latter quantity converges as $n \to \infty$ towards the Laplace transform of
an infinitely divisible variable

\[ \mathbb{E}(\exp(-q Z_1)) = \exp(-x \kappa(q)) , \]

where the cumulant $\kappa$ is given by the Lévy-Khintchine formula

\[ \kappa(q) = \int_{(0,\infty)} (1 - e^{-qy})\, \nu(dy) . \tag{7} \]

We underline the fact that the drift coefficient is 0; this will play an important role in the
sequel. An application of the Markov property now shows that more generally

\[ \bigl(n^{-1} Z^{(n)}_k : k \in \mathbb{Z}_+\bigr) \Longrightarrow (Z_k : k \in \mathbb{Z}_+) \tag{8} \]

where $(Z_k : k \in \mathbb{Z}_+)$ is a Markov chain with values in $\mathbb{R}_+$, started from $Z_0 = x$ and whose
transition probabilities are characterized as follows: for every $k \in \mathbb{Z}_+$ and $q, y \ge 0$,

\[ \mathbb{E}\bigl(e^{-q Z_{k+1}} \mid Z_k = y\bigr) = \exp(-y \kappa(q)) . \]

One refers to the limiting chain Z as a (discrete time) continuous state branching process, in
short CSBP, with reproduction measure $\nu$ and started from x.
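
The transition mechanism of a discrete time CSBP is easy to simulate when the reproduction measure is finite. The Python sketch below makes a simplifying assumption that is not in the paper: it takes $\nu(dy) = c\, e^{-y} dy$, for which the cumulant is $\kappa(q) = c\, q/(1+q)$ and one step from state y is a sum of a Poisson(cy) number of standard exponential variables. The function names are illustrative.

```python
import random


def poisson(lam, rng):
    """Sample a Poisson(lam) variable by counting unit-rate arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def kappa(q, c):
    """Cumulant of the assumed measure nu(dy) = c * exp(-y) dy."""
    return c * q / (1.0 + q)


def csbp_step(y, c, rng):
    """One generation: sum of the atoms of a Poisson measure with intensity y * nu."""
    return sum(rng.expovariate(1.0) for _ in range(poisson(c * y, rng)))


def csbp_path(x, c, generations, rng):
    """Sample (Z_0, ..., Z_generations) started from Z_0 = x."""
    path = [x]
    for _ in range(generations):
        path.append(csbp_step(path[-1], c, rng))
    return path
```

A Monte Carlo average of `math.exp(-q * csbp_step(y, c, rng))` over many samples should be close to `math.exp(-y * kappa(q, c))`, which is the transition formula stated above specialised to this choice of $\nu$.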

It is interesting to recast the preceding convergence in the framework of the law of rare events.
In this direction, recall that the Lévy-Itô decomposition of the infinitely divisible variable $Z_1$
reads

\[ Z_1 = \sum_{i=1}^{\infty} a_i , \tag{9} \]

where $a_1 \ge a_2 \ge \ldots$ are the atoms, ranked in the decreasing order, of a Poisson random measure
on $(0, \infty)$ with intensity $x\nu$, with the convention that atoms are repeated according to their
multiplicity and that when the Poisson random measure is finite (which occurs if and only
if $\nu((0, \infty)) < \infty$), then $a_i = 0$ whenever the index i exceeds the total mass of the Poisson
measure. Consider for every $n \in \mathbb{N}$ a family $(\xi^{(n)}_i : 1 \le i \le a(n))$ of i.i.d. variables with law $p^{(n)}$;
we should think of $\xi^{(n)}_i$ as the number of children of the i-th ancestor in the Galton-Watson
process $Z^{(n)}$. Denote by $a^{(n)}_1 \ge a^{(n)}_2 \ge \ldots \ge a^{(n)}_{a(n)}$ the decreasing reordering of the rescaled
variables $(n^{-1} \xi^{(n)}_i : 1 \le i \le a(n))$. In the regime (5), the law of rare events for null arrays (e.g.
Theorem 14.18 in [12]) ensures that

\[ \bigl(a^{(n)}_1, a^{(n)}_2, \ldots, a^{(n)}_{a(n)}\bigr) \Longrightarrow (a_1, a_2, \ldots) , \tag{10} \]

in the sense of finite dimensional distributions. Note also that (6) can be re-written in this
setting as

\[ \sum_{i=1}^{a(n)} a^{(n)}_i \Longrightarrow \sum_{i \in \mathbb{N}} a_i ; \]

however the latter does not follow from (10).

This invites us to describe the convergence of rescaled Galton-Watson processes to (discrete
time) CSBP from another point of view that takes into account the genealogy, and not merely
the total sizes of populations at given generations. In this direction, we use a representation
of the latter as random processes indexed by the universal tree $\mathbb{U}$. For simplicity, suppose for
a while that the Lévy measure $\nu$ is infinite, so a Poisson random measure with intensity $c\nu$
with $c > 0$ has infinitely many atoms a.s. Recall from the Lévy-Itô decomposition (9) that
almost all the individuals at the first generation in a CSBP descend from only countably many
ancestors (we stress again that we are dealing with cumulants $\kappa$ with no drift component),
and plainly the same feature holds for the subsequent generations. Roughly speaking, vertices
$u \in \mathbb{U}$ at level $|u| = k \ge 1$ represent the sizes of the sub-populations at generation k in the
CSBP which descend from the same parent at generation $k - 1$. We construct a random process
$(\mathcal{Z}_u : u \in \mathbb{U})$ related to the CSBP Z, where $\mathcal{Z}_{uj}$ is the size of the j-th largest sub-population
at generation $|u| + 1$ which descends from a parent in the sub-population represented by u. We
stress that the process $\mathcal{Z}$ is by definition non-increasing among siblings, i.e. the map $j \to \mathcal{Z}_{uj}$ is
non-increasing on $\mathbb{N}$ for every $u \in \mathbb{U}$. More precisely, conditionally on $\mathcal{Z}_u = z$, the Lévy-Itô
decomposition (9) suggests that $(\mathcal{Z}_{uj} : j \in \mathbb{N})$ should be given by the sequence of the atoms of
a Poisson random measure on $(0, \infty)$ with intensity $z\nu$, where atoms are repeated according to
their multiplicity and ranked in the decreasing order. We make the construction formal in the
following definition.



Definition 1 Fix $x > 0$ and $\nu$ a measure on $(0, \infty)$ with $\int (1 \wedge y)\,\nu(\mathrm{d}y) < \infty$. A tree-indexed
CSBP with reproduction measure $\nu$ and initial population of size $x$ is a process $(\mathcal{Z}_u : u \in \mathbb{U})$ with
values in $\mathbb{R}_+$ and indexed by the universal tree, whose distribution is characterized by induction
on the levels as follows:

(i) $\mathcal{Z}_\varnothing = x$ a.s.;

(ii) for every $k \in \mathbb{Z}_+$, conditionally on $(\mathcal{Z}_v : v \in \mathbb{U}, |v| \le k)$, the sequences $(\mathcal{Z}_{uj})_{j \in \mathbb{N}}$ for the
vertices $u$ at generation $|u| = k$ are independent, and each sequence $(\mathcal{Z}_{uj})_{j \in \mathbb{N}}$ is distributed
as the family of the atoms of a Poisson random measure on $(0, \infty)$ with intensity $\mathcal{Z}_u\nu$, where
atoms are repeated according to their multiplicity, ranked in the decreasing order, and completed
by an infinite sequence of $0$ if the Poisson measure is finite.
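
The induction in Definition 1 can be sketched in a few lines of code. The example below is a
minimal illustration under an assumption that is ours and not the paper's: the reproduction measure
is taken finite, namely $\nu(\mathrm{d}y) = \texttt{rate} \cdot e^{-y}\,\mathrm{d}y$, so that each Poisson random measure has
finitely many atoms. Vertices of the universal tree are encoded as tuples of positive integers (the
root being the empty tuple), and the zeros completing each sequence are simply not stored. All
function names are hypothetical.

```python
import random


def poisson_count(mean, rng):
    """Number of points of a rate-1 Poisson process falling in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def ranked_atoms(z, rate, rng):
    """Atoms, in decreasing order, of a Poisson random measure with intensity z * nu,
    for the illustrative finite measure nu(dy) = rate * exp(-y) dy."""
    n_atoms = poisson_count(z * rate, rng)
    return sorted((rng.expovariate(1.0) for _ in range(n_atoms)), reverse=True)


def tree_indexed_csbp(x, rate, depth, rng):
    """Return {vertex: size} for all vertices up to level `depth` with non-zero size."""
    tree = {(): x}  # step (i): the root carries the initial population
    level = [()]
    for _ in range(depth):  # step (ii): one level at a time
        next_level = []
        for u in level:
            # Given the sizes already built, the children of u are independent of the rest.
            for j, atom in enumerate(ranked_atoms(tree[u], rate, rng), start=1):
                tree[u + (j,)] = atom
                next_level.append(u + (j,))
        level = next_level
    return tree


if __name__ == "__main__":
    tree = tree_indexed_csbp(x=2.0, rate=1.5, depth=3, rng=random.Random(0))
    for k in range(4):
        print(k, sum(size for u, size in tree.items() if len(u) == k))
```

Summing the sizes over each level, as in the last lines, gives the total population per generation.
For an infinite measure $\nu$ one would have to truncate the small atoms, which this sketch does not
attempt.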

It should be plain that if $\mathcal{Z}$ is a tree-indexed CSBP with reproduction measure $\nu$ and initial
population of size $x$, then $\left(\sum_{|u|=k} \mathcal{Z}_u : k \in \mathbb{Z}_+\right)$ is a CSBP with reproduction measure $\nu$ started
from $x$. We also point out that for every integer $n$, we can represent similarly the genealogy
for the Galton-Watson process $Z^{(n)}$ as a process $\mathcal{Z}^{(n)}$ indexed by the universal tree $\mathbb{U}$, and one
can check that under the regime (5), the following extension of (8) holds:

$$(n^{-1}\mathcal{Z}^{(n)}_u : u \in \mathbb{U}) \Longrightarrow (\mathcal{Z}_u : u \in \mathbb{U})$$

in the sense of finite dimensional distributions. This should be viewed as a variation of the law
of rare events (10); the easy proof is left to the interested reader.

We now conclude this section by underlining the connexion between discrete time CSBP and
subordinators (i.e. Lévy processes with values in $\mathbb{R}_+$). Consider a subordinator $\tau = (\tau_t : t \ge 0)$
with no drift and Lévy measure $\nu$. Its cumulant $\kappa$ is given by the Lévy-Khintchine formula (7)
and we have

$$\mathbb{E}(e^{-q\tau_t}) = \exp(-t\kappa(q)) , \qquad \text{for all } q, t \ge 0.$$

Fix $x > 0$ and define a sequence $(\zeta_k : k \in \mathbb{Z}_+)$ by implicit iteration as follows:

$$\zeta_0 = x , \quad \zeta_1 = \tau_x , \quad \zeta_1 + \zeta_2 = \tau_{x+\zeta_1} , \quad \ldots , \quad \zeta_1 + \cdots + \zeta_{k+1} = \tau_{x+\zeta_1+\cdots+\zeta_k} .$$

Observe by an easy induction that the random times $x + \zeta_1 + \cdots + \zeta_k$ are stopping times in the
natural filtration of $\tau$, so that the strong Markov property can be applied. It is then immediate
to check that $(\zeta_k : k \in \mathbb{Z}_+)$ is a CSBP with reproduction measure $\nu$ and initial population of
size $x$.
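
As a hedged illustration of this iteration, one may take for $\tau$ the process of first passage times
of $s \mapsto cs - \sigma B_s$, which is the inverse Gaussian subordinator appearing in Section 4; this
choice, the parameter values and the function names are assumptions made for the example only. By
the strong Markov property, $\zeta_{k+1}$ given $\zeta_k$ has the law of $\tau_{\zeta_k}$, so the sketch samples
independent increments instead of a whole path of $\tau$. The sampler is a numerically stable variant
of the Michael-Schucany-Haas method for the inverse Gaussian law with mean $t/c$ and shape $t^2/\sigma^2$.

```python
import math
import random


def sample_tau(t, c, sigma, rng):
    """Sample tau_t, the first passage time above level t of s -> c*s - sigma*B_s."""
    if t <= 0.0:
        return 0.0
    mean = t / c
    y = rng.gauss(0.0, 1.0) ** 2
    # The two candidate values are the roots of a quadratic whose product is mean**2.
    half_sum = mean + sigma * sigma * y / (2.0 * c * c)
    larger = half_sum + math.sqrt(max(half_sum * half_sum - mean * mean, 0.0))
    smaller = mean * mean / larger
    # The smaller root is selected with probability mean / (mean + smaller).
    if rng.random() <= mean / (mean + smaller):
        return smaller
    return larger


def csbp_chain(x, c, sigma, generations, rng):
    """Return [zeta_0, ..., zeta_generations] with zeta_0 = x."""
    zeta = [x]
    for _ in range(generations):
        # The increment of tau over an interval of length zeta_k is distributed as tau_{zeta_k}.
        zeta.append(sample_tau(zeta[-1], c, sigma, rng))
    return zeta


if __name__ == "__main__":
    print(csbp_chain(x=1.0, c=2.0, sigma=1.0, generations=5, rng=random.Random(1)))
```

With these parameters $\mathbb{E}(\tau_t) = t/c$, so the simulated chain decreases on average by a factor
$c$ per generation.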

More generally, the tree-indexed CSBP $\mathcal{Z}$ can be constructed from the subordinator $\tau$ by
making full use of the Lévy-Itô decomposition. Specifically, we know from the latter that the
Stieltjes measure $\mathrm{d}\tau$ on the random interval

$$I_k := (x + \zeta_1 + \cdots + \zeta_{k-1}, \; x + \zeta_1 + \cdots + \zeta_k]$$

is purely atomic, and conditionally on $I_k$, the sequence of the atomic masses has the same
distribution as the family of the atoms in a Poisson point measure on $(0, \infty)$ with intensity
$|I_k|\nu$. These atoms should be viewed as the sizes of sub-families at level $k$, so it remains to
identify the siblings and rank atoms corresponding to a same sibling in the decreasing order.
This is straightforward for the first levels but becomes increasingly intricate for larger levels.
Specifically, we let $\mathcal{Z}_\varnothing = x$ and declare that $(\mathcal{Z}_j : j \in \mathbb{N})$ is given by the sequence of the jumps
of $\tau$ on $(0, x]$ ranked in the decreasing order. Next $(\mathcal{Z}_{1j} : j \in \mathbb{N})$ corresponds to the ranked
sequence of the jumps of $\tau$ on the interval $(x, x + \mathcal{Z}_1]$, $(\mathcal{Z}_{2j} : j \in \mathbb{N})$ to those on the interval
$(x + \mathcal{Z}_1, x + \mathcal{Z}_1 + \mathcal{Z}_2]$, and so on. The algorithm may be thought of as a variant of the breadth
first search in which each sibling is ordered according to the size of its progeny.
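
The breadth first exploration just described can be sketched as follows. The code is an illustration
under the simplifying assumption, which is ours, that the subordinator has finitely many jumps on
the time window considered (for instance a compound Poisson process), given as a list of
(time, size) pairs; the names `tree_from_jumps`, `jumps` and `depth` are hypothetical. Each vertex
is allotted a time interval whose length equals its size, the intervals being laid end to end in
breadth first order, and its children are the jumps falling in that interval, ranked decreasingly.

```python
from collections import deque


def tree_from_jumps(x, jumps, depth):
    """Build {vertex: size} up to level `depth` from the jumps (time, size) of a subordinator."""
    tree = {(): x}
    queue = deque([()])
    cursor = 0.0  # left end-point of the interval allotted to the next vertex
    while queue:
        u = queue.popleft()
        left, right = cursor, cursor + tree[u]
        cursor = right
        if len(u) >= depth:
            continue  # the interval is still consumed, but u is not expanded further
        sizes = sorted((s for (t, s) in jumps if left < t <= right), reverse=True)
        for j, s in enumerate(sizes, start=1):
            tree[u + (j,)] = s
            queue.append(u + (j,))
    return tree


if __name__ == "__main__":
    jumps = [(0.5, 1.0), (1.5, 3.0), (2.5, 0.5), (4.0, 0.25), (5.5, 2.0), (7.0, 1.0)]
    print(tree_from_jumps(2.0, jumps, depth=2))
```

In this toy example the root has size 2, its children are the jumps on $(0, 2]$ (sizes 3 and 1), the
children of the vertex 1 are the jumps on $(2, 5]$ and those of the vertex 2 the jumps on $(5, 6]$.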

4 Asymptotic for rare mutations 

This section contains our main results on limits of trees of alleles; we shall first present and 
discuss the general framework, then state the results, and finally prove the latter. 
4.1 Framework and main results

We consider a fixed probability measure $\pi^{(+)}$ on $\mathbb{Z}_+$ which serves as reproduction law for a
standard Galton-Watson process denoted by $Z^{(+)}$. We assume that $Z^{(+)}$ is critical, i.e.

$$\sum_{i=0}^{\infty} i\,\pi^{(+)}(i) = 1 ,$$

and has a finite variance

$$\sum_{i=0}^{\infty} (i-1)^2\,\pi^{(+)}(i) = \sigma^2 < \infty .$$

Further, we suppose that mutations affect each child according to a fixed probability $p \in (0, 1)$
and independently of the other children. That is to say that the probability measure $\pi$ on
$\mathbb{Z}_+ \times \mathbb{Z}_+$ which gives the law of the number of clone children and the number of mutant
children of a typical individual is given by

$$\pi(k, \ell) = \pi^{(+)}(k + \ell) \binom{k+\ell}{k} (1-p)^k p^{\ell} , \qquad k, \ell \in \mathbb{Z}_+ .$$

We will use the notation $\mathbb{P}^p_a$ for the probability measure under which the Galton-Watson process
$Z^{(+)}$ has $a$ ancestors and the mutation rate is $p$, and $\mathcal{L}(\cdot, \mathbb{P}^p_a)$ will then refer to the distribution
of a random variable or a process under $\mathbb{P}^p_a$.
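
The joint law of clone and mutant children is obtained from the total offspring law by independent
Bernoulli marking of the children. The sketch below illustrates this on a toy critical law chosen
by us (0, 1 or 2 children with probabilities 1/4, 1/2, 1/4); the function name and the dictionary
encoding are assumptions for the example. It also checks numerically that the mean number of clone
children is $1 - p$ and that of mutant children is $p$.

```python
import math


def clone_mutant_law(total_law, p):
    """Joint law {(k, l): probability} of k clone and l mutant children, when each child of an
    individual with offspring law `total_law` is a mutant with probability p, independently."""
    joint = {}
    for i, weight in total_law.items():
        for l in range(i + 1):
            k = i - l
            joint[(k, l)] = weight * math.comb(i, l) * p ** l * (1.0 - p) ** k
    return joint


if __name__ == "__main__":
    critical_law = {0: 0.25, 1: 0.5, 2: 0.25}  # unit mean, variance 1/2
    joint = clone_mutant_law(critical_law, p=0.1)
    mean_clones = sum(k * w for (k, l), w in joint.items())
    mean_mutants = sum(l * w for (k, l), w in joint.items())
    print(mean_clones, mean_mutants)
```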

We are interested in the situation where the mutation rate $p = p(n)$ is small and the number
of ancestors $a = a(n)$ is large when the parameter $n$ goes to infinity. Specifically, we consider the
regime

$$a(n) \sim nx \quad \text{and} \quad p(n) \sim cn^{-1} , \tag{11}$$

where $c, x$ are some positive constants. Let us start by mentioning some results of convergence
in distribution for Galton-Watson processes in this setting.

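First, we know from (1) that the Galton-Watson process $Z^{(+)}$ properly rescaled converges
to a Feller diffusion on $\mathbb{R}_+$; specifically

$$\mathcal{L}\left((n^{-1}Z^{(+)}_{\lfloor nt \rfloor} : t \ge 0), \mathbb{P}^{p(n)}_{a(n)}\right) \Longrightarrow (X^{(+)}_t : t \ge 0) \quad \text{as } n \to \infty , \tag{12}$$

where $(X^{(+)}_t : t \ge 0)$ solves the SDE (2) for the parameter $b = 0$. In the same direction, the
marginal law of $\xi^{(c)}$ under $\mathbb{P}^{p(n)}$ has first moment $1 - p(n)$ and variance close to $\sigma^2$ when $n$
is large. Hence, if $Z^{(c)}$ denotes the Galton-Watson process of clones (i.e. we only consider
individuals of the 0-th type), then

$$\mathcal{L}\left((n^{-1}Z^{(c)}_{\lfloor nt \rfloor} : t \ge 0), \mathbb{P}^{p(n)}_{a(n)}\right) \Longrightarrow (X^{(c)}_t : t \ge 0) \quad \text{as } n \to \infty , \tag{13}$$

where $(X^{(c)}_t : t \ge 0)$ is another Feller diffusion, solution to the SDE (2) for the parameter $b = -c$.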
On the other hand, recall from Lemma 1 and Corollary 1(i) that the process of the number of
mutants of given types $(M_k : k \in \mathbb{Z}_+)$ is a critical Galton-Watson process with finite variance.
In view of the classical limit theorem stated as (1) in Section 3, one might suspect that the
rescaled process $(n^{-1}M_{\lfloor nt \rfloor} : t \ge 0)$ could converge to some Feller diffusion. However this is
not the case; indeed an easy calculation shows that the variance of the reproduction law of
$M$ under $\mathbb{P}^{p(n)}$ is of order $n$, and thus the requirement (3) fails. Nonetheless one can deduce
from a few lines of calculations based on Proposition 1 that the condition (5) is fulfilled by the
reproduction law of $M$ under $\mathbb{P}^{p(n)}$, and hence

$$\mathcal{L}\left((n^{-1}M_k : k \in \mathbb{Z}_+), \mathbb{P}^{p(n)}_{a(n)}\right)$$

converges weakly when $n \to \infty$ towards the law of some discrete time CSBP started from $x$.
We do not give a formal statement as the forthcoming Proposition 2 is a stronger result.

The asymptotics (12) and (13) point to the fact that in the regime (11), the total size of the
population of the Galton-Watson process should be rescaled by a factor $n^{-2}$, and in particular
the asymptotic behavior of the number $T_0 = \sum_{k=0}^{\infty} Z^{(c)}_k$ of individuals of 0-th type is given by

$$\mathcal{L}\left(n^{-2}T_0, \mathbb{P}^{p(n)}_{a(n)}\right) \Longrightarrow \int_0^{\infty} X^{(c)}_t \, \mathrm{d}t .$$

More generally, we have the following joint convergence in distribution for the rescaled process
of the sizes of sub-populations and the number of mutants of a given type.



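Proposition 2 In the regime (11), we have

$$\mathcal{L}\left(((n^{-2}T_k, n^{-1}M_{k+1}) : k \in \mathbb{Z}_+), \mathbb{P}^{p(n)}_{a(n)}\right) \Longrightarrow ((Z_{k+1}, cZ_{k+1}) : k \in \mathbb{Z}_+)$$

where $(Z_k : k \in \mathbb{Z}_+)$ is a CSBP with reproduction measure

$$\nu(\mathrm{d}y) = \frac{c}{\sqrt{2\pi\sigma^2 y^3}} \exp\left(-\frac{c^2 y}{2\sigma^2}\right) \mathrm{d}y , \qquad y > 0 ,$$

and initial population of size $x/c$.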



The Lévy-Itô decomposition now suggests that conditionally on $n^{-2}T_k \sim z$, the sequence of
the sizes of the sub-populations carrying a same allele of the $(k+1)$-th type and normalized by
a factor $n^{-2}$ should converge in distribution to the sequence of the atoms of a Poisson random
measure on $(0, \infty)$ with intensity $z\nu$, where $\nu$ is specified in Proposition 2. Recall also that $d_u$
denotes the outer degree at the vertex $u \in \mathbb{U}$ in the tree of alleles, and observe from Lemma 2 that
for a Galton-Watson process with neutral mutations, the process $((\mathcal{A}_u, d_u) : u \in \mathbb{U})$ has a simpler
Markovian structure than $(\mathcal{A}_u : u \in \mathbb{U})$ alone. This leads us to our main limit theorem for the tree of
alleles.

Theorem 1 In the regime (11), the rescaled tree of alleles $n^{-2}\mathcal{A}$ under $\mathbb{P}^{p(n)}_{a(n)}$ converges in the
sense of finite dimensional distributions to the tree-indexed CSBP $(\mathcal{Z}_u : u \in \mathbb{U})$ with reproduction
measure $\nu$ given in Proposition 2 and random initial population with inverse Gaussian
distribution:

$$\mathbb{P}(\mathcal{Z}_\varnothing \in \mathrm{d}y) = \frac{x}{\sqrt{2\pi\sigma^2 y^3}} \exp\left(-\frac{(cy - x)^2}{2\sigma^2 y}\right) \mathrm{d}y , \qquad y > 0 .$$

More precisely, if we also take into account the outer degrees, then we have the joint convergence
in the sense of finite dimensional distributions:

$$\mathcal{L}\left(((n^{-2}\mathcal{A}_u, n^{-1}d_u) : u \in \mathbb{U}), \mathbb{P}^{p(n)}_{a(n)}\right) \Longrightarrow ((\mathcal{Z}_u, c\mathcal{Z}_u) : u \in \mathbb{U}) .$$

4.2 Proofs 

Let us first present informally some intuitions for the proofs, which rely on the connexion with
random walks in Section 2.3. Roughly speaking, we shall observe that in the regime (11), the
random walk $S^{(c)}$ suitably rescaled converges to a Brownian motion with negative drift. As
the lengths of the excursions of $S^{(c)}$ above its current minimum correspond to the sizes of
sub-populations with the same allele, this suggests that in the limit, the lengths of the excursions of
a Brownian motion with drift above its current minimum should describe the limit of rescaled
sub-populations. According to Itô's excursion theory, these lengths can be described in terms
of a Poisson point process. The comparison with the construction of the tree-indexed CSBP
presented in Section 3.2 should then make Theorem 1 more intuitive.

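The proofs of Proposition 2 and Theorem 1 both rely on the following technical lemma.

Lemma 4 In the regime (11), we have:

(i) Let $(\tau_x : x \ge 0)$ be an inverse Gaussian subordinator with cumulant

$$\kappa(q) = \sigma^{-2}\left(\sqrt{c^2 + 2q\sigma^2} - c\right) = c^{-1}\int_{(0,\infty)} (1 - e^{-qy})\,\nu(\mathrm{d}y) , \qquad q \ge 0 ,$$

i.e. with zero drift and Lévy measure $c^{-1}\nu$, where $\nu$ is given in Proposition 2. Then

$$\mathcal{L}\left((n^{-2}T_0, n^{-1}M_1), \mathbb{P}^{p(n)}_{a(n)}\right) \Longrightarrow (\tau_x, c\tau_x) .$$

(ii) The behavior of the joint tail distribution of $T_0$ and $M_1$ under $\mathbb{P}^{p(n)}_1$ is given by

$$\lim_{n \to \infty} n\,\mathbb{P}^{p(n)}_1\left(n^{-2}T_0 > t \text{ or } n^{-1}M_1 > m\right) = c^{-1}\bar{\nu}(\min(t, m/c)) \quad \text{in } L^1_{\mathrm{loc}}(\mathbb{R}_+ \times \mathbb{R}_+, \mathrm{d}t\,\mathrm{d}m) ,$$

where $\bar{\nu}$ denotes the tail function of the Lévy measure $\nu$.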

Proof: One could establish these limits from the explicit expressions in Proposition 1; however
a probabilistic argument based on the construction in Section 2.3 circumvents the somewhat
tedious calculations.

(i) Recall that the fixed reproduction law $\pi^{(+)}$ has unit mean and variance $\sigma^2$. For each
$n \in \mathbb{N}$, consider a random walk $(S^{(+,n)}_k : k \in \mathbb{Z}_+)$ started from $S^{(+,n)}_0 = a(n)$ and with step
distribution that of $\xi^{(+)} - 1$. By Donsker's invariance principle and Skorohod's representation,
we may suppose that with probability one

$$\lim_{n \to \infty} n^{-1}S^{(+,n)}_{\lfloor n^2 t \rfloor} = x + \sigma B_t ,$$

where $(B_t : t \ge 0)$ is a standard Brownian motion and the convergence holds uniformly on
every compact time-interval.

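For every fixed $n$, we now decompose each variable $\xi^{(+,n)}_i$ as the sum $\xi^{(+,n)}_i = \xi^{(c,n)}_i + \xi^{(m,n)}_i$ by
using a Bernoulli sampling; that is, conditionally on $\xi^{(+,n)}_i = \ell$, $\xi^{(m,n)}_i$ has the binomial distribution
with parameter $(\ell, p(n))$. Of course, we use independent Bernoulli sampling for the different
indices $i$, so that the pairs $(\xi^{(c,n)}_i, \xi^{(m,n)}_i)$ are i.i.d. and have the law of $\xi$ under $\mathbb{P}^{p(n)}$. If we define

$$S^{(m,n)}_k := \xi^{(m,n)}_1 + \cdots + \xi^{(m,n)}_k , \qquad k \in \mathbb{Z}_+ ,$$

then $\mathbb{E}(\xi^{(m,n)}_1) = p(n) \sim c/n$ and $\mathrm{var}(\xi^{(m,n)}_1) = O(1/n)$, and it is easy to verify that with
probability one

$$\lim_{n \to \infty} n^{-1}S^{(m,n)}_{\lfloor n^2 t \rfloor} = ct ,$$

uniformly on every compact time-interval. Hence the random walk

$$S^{(c,n)}_k := a(n) + \xi^{(c,n)}_1 + \cdots + \xi^{(c,n)}_k - k = S^{(+,n)}_k - S^{(m,n)}_k$$

fulfills

$$\lim_{n \to \infty} n^{-1}S^{(c,n)}_{\lfloor n^2 t \rfloor} = x + \sigma B_t - ct ,$$

where again the convergence holds a.s., uniformly on every compact time-interval.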
Now recall the framework of Section 2.3 and introduce

$$\varsigma^{(n)}(0) := \inf\{k \in \mathbb{Z}_+ : S^{(c,n)}_k = 0\} \quad \text{and} \quad \Sigma^{(n)}(0) := S^{(m,n)}_{\varsigma^{(n)}(0)} = \sum_{i=1}^{\varsigma^{(n)}(0)} \xi^{(m,n)}_i .$$

It follows readily from the preceding observations that with probability one

$$\lim_{n \to \infty} n^{-2}\varsigma^{(n)}(0) = \tau_x \quad \text{and} \quad \lim_{n \to \infty} n^{-1}\Sigma^{(n)}(0) = c\tau_x ,$$

where $\tau$ denotes the process of first passage times for a Brownian motion with drift,

$$\tau_y := \inf\{t \ge 0 : ct - \sigma B_t > y\} , \qquad y \ge 0 .$$

It is well-known that the latter is a subordinator with cumulant $\kappa$ as given in the statement,
and the first claim is established by an appeal to Lemma 3.
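
The identification of the cumulant can be checked numerically. The sketch below is a sanity check
rather than a proof: it integrates the classical first passage density of level $x$ for
$t \mapsto ct - \sigma B_t$, namely $x(2\pi\sigma^2 y^3)^{-1/2}\exp(-(cy - x)^2/(2\sigma^2 y))$, against $e^{-qy}$ by a
midpoint rule and compares the result with $\exp(-x\kappa(q))$. The truncation level, the number of
steps, the parameter values and the function names are choices made for the example.

```python
import math


def cumulant(q, c, sigma):
    """kappa(q) = (sqrt(c^2 + 2 q sigma^2) - c) / sigma^2."""
    return (math.sqrt(c * c + 2.0 * q * sigma * sigma) - c) / (sigma * sigma)


def first_passage_density(y, x, c, sigma):
    """Density at y > 0 of the first passage time above x of t -> c*t - sigma*B_t."""
    prefactor = x / (sigma * math.sqrt(2.0 * math.pi * y ** 3))
    return prefactor * math.exp(-((c * y - x) ** 2) / (2.0 * sigma * sigma * y))


def laplace_transform(q, x, c, sigma, upper=80.0, steps=200_000):
    """Midpoint-rule approximation of the integral of exp(-q y) * density(y) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += math.exp(-q * y) * first_passage_density(y, x, c, sigma)
    return total * h


if __name__ == "__main__":
    q, x, c, sigma = 1.0, 1.0, 1.0, 1.0
    print(laplace_transform(q, x, c, sigma), math.exp(-x * cumulant(q, c, sigma)))
```

The two printed numbers should agree up to the discretization error of the quadrature.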

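(ii) The branching property shows that the law of $(T_0, M_1)$ under $\mathbb{P}^{p(n)}_{a(n)}$ is that of the sum
of $a(n)$ i.i.d. variables distributed as $(T_0, M_1)$ under $\mathbb{P}^{p(n)}_1$. This observation enables us to deduce
(ii) from (i) by an argument similar to that we use to establish (6). Indeed, write

$$\bar{\mu}_n(t, m) = \mathbb{P}^{p(n)}_1(T_0 > t \text{ or } M_1 > m)$$

for the bivariate tail distribution of the pair $(T_0, M_1)$ under $\mathbb{P}^{p(n)}_1$. By an elementary calculation,
we have that for every $q, r > 0$

$$\mathbb{E}^{p(n)}_{a(n)}\left(\exp(-q n^{-2}T_0 - r n^{-1}M_1)\right) = \left(1 - qr\int_0^{\infty}\!\!\int_0^{\infty} e^{-qt}e^{-rm}\,\bar{\mu}_n(n^2 t, nm)\,\mathrm{d}t\,\mathrm{d}m\right)^{a(n)} .$$

We know from (i) that this quantity converges as $n \to \infty$ towards

$$\mathbb{E}(\exp(-(q + cr)\tau_x)) = \exp\left(-x\int_{(0,\infty)} (1 - e^{-(q+cr)y})\,c^{-1}\nu(\mathrm{d}y)\right) ,$$

so that taking logarithms, we get

$$\lim_{n \to \infty} qr\,a(n)\int_0^{\infty}\!\!\int_0^{\infty} e^{-qt}e^{-rm}\,\bar{\mu}_n(n^2 t, nm)\,\mathrm{d}t\,\mathrm{d}m = x\int_{(0,\infty)} (1 - e^{-(q+cr)y})\,c^{-1}\nu(\mathrm{d}y)$$

$$= xqr\int_0^{\infty}\!\!\int_0^{\infty} e^{-qt}e^{-rm}\,c^{-1}\bar{\nu}(\min(t, m/c))\,\mathrm{d}t\,\mathrm{d}m .$$

This entails our claim. □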

Proposition 2 immediately follows from Lemma 1 and Lemma 4(i), so we turn our attention
to the proof of Theorem 1.

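Proof of Theorem 1: Recall that $\mathcal{A}_\varnothing = T_0$ and $d_\varnothing = M_1$. On the one hand, we know from
Lemma 4(i) that

$$\mathcal{L}\left((n^{-2}\mathcal{A}_\varnothing, n^{-1}d_\varnothing), \mathbb{P}^{p(n)}_{a(n)}\right) \Longrightarrow (\tau_x, c\tau_x) .$$

On the other hand, Lemma 4(ii) and the law of rare events for null arrays (e.g. Theorem 14.18
in [12]) entail that for any sequence of integers $b(n)$ such that $b(n) \sim bn$ for some $b > 0$,

$$\mathcal{L}\left((n^{-2}T_0, n^{-1}M_1)^{(b(n)\downarrow)}, \mathbb{P}^{p(n)}_1\right) \Longrightarrow ((a_1, ca_1), (a_2, ca_2), \ldots) ,$$

where the superscript notation $(b\downarrow)$ has been defined just before Lemma 2 and $(a_1, a_2, \ldots)$ stands for the
sequence ranked in the decreasing order of the atoms of a Poisson measure on $(0, \infty)$ with
intensity $bc^{-1}\nu$.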

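Denote by $\mathbb{Q}_x$ the law of a tree-indexed CSBP with initial population distributed as $\tau_x$ and
reproduction measure $\nu$. We now see from Lemma 2 that

$$\mathcal{L}\left(((n^{-2}\mathcal{A}_u, n^{-1}d_u) : |u| \le 1), \mathbb{P}^{p(n)}_{a(n)}\right) \Longrightarrow \mathcal{L}\left(((\mathcal{A}_u, c\mathcal{A}_u) : |u| \le 1), \mathbb{Q}_x\right)$$

in the sense of finite dimensional convergence. Lemma 2 enables us to iterate the argument to
the subsequent levels of vertices, which establishes our claim. □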